
Conversation

@SKopecz
Collaborator

@SKopecz SKopecz commented Dec 29, 2025

All MPRK and SSPMPRK schemes are built upon a basic Patankar step with varying Patankar matrices, destruction vectors, and denominators. This PR introduces several functions (in-place and out-of-place) to make the current implementations more readable and to simplify the implementation of new schemes in #107. A rough sketch of such a step is given after the list below.

  1. Out-of-place implementations
  • MPE
  • MPRK22
  • MPRK43I & MPRK43II
  • SSPMPRK22
  • SSPMPRK43
  2. In-place implementations
  • MPE
  • MPRK22
  • MPRK43I & MPRK43II
  • SSPMPRK22
  • SSPMPRK43
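
For orientation, such a basic Patankar step builds a Patankar matrix from a production matrix P, a destruction vector d, the Patankar-weight denominators σ, and the time step dt, and then solves a linear system with it. The following is only a rough sketch under these assumptions: the function name and signature are illustrative (not the helpers introduced in this PR), the dense backslash solve stands in for whatever solver the schemes actually use, and the matrix entries follow the formulas used by build_mprk_matrix.

using LinearAlgebra

# Hypothetical sketch, not the PR's actual API: assemble the Patankar matrix
# M with M[i,i] = (σᵢ + dt*dᵢ + dt*Σ_{k≠i} P[k,i]) / σᵢ and
# M[i,j] = -dt*P[i,j] / σⱼ, then solve the weighted linear system.
function basic_patankar_step_sketch(rhs, P, d, sigma, dt)
    n = length(sigma)
    M = zeros(n, n)
    for i in 1:n
        # diagonal entry
        M[i, i] = sigma[i] + dt * d[i]
        for k in 1:n
            k == i && continue
            M[i, i] += dt * P[k, i]
        end
        M[i, i] /= sigma[i]
        # off-diagonal entries
        for j in 1:n
            j == i && continue
            M[i, j] = -dt * P[i, j] / sigma[j]
        end
    end
    return M \ rhs
end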

@SKopecz SKopecz marked this pull request as draft December 29, 2025 13:00
@codecov

codecov bot commented Dec 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@coveralls

coveralls commented Dec 29, 2025

Pull Request Test Coverage Report for Build 21143246237

Details

  • 262 of 262 (100.0%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.3%) to 97.493%

Totals:
  • Change from base Build 20641694759: -0.3%
  • Covered Lines: 1789
  • Relevant Lines: 1835

💛 - Coveralls

@SKopecz SKopecz changed the title WIP: Simplified and clearer code using the function basic_patankar_step. WIP: Simplified and cleaner code using the function basic_patankar_step. Dec 29, 2025
@SKopecz
Collaborator Author

SKopecz commented Jan 4, 2026

Introduced a function to compute linear combinations for out-of-place computations. It is implemented recursively, but the recursion is unrolled at compile time.

using BenchmarkTools, StaticArrays 

# --- Base cases (End of recursion) ---

# For PDSFunctions (with destruction vectors)
@inline lincomb(c1::Number, P1, d1::AbstractArray) = (c1 .* P1, c1 .* d1)

# For ConservativePDSFunctions (without destruction vectors)
@inline lincomb(c1::Number, P1, d1::Nothing) = (c1 .* P1, nothing)


# --- Recursive steps ---

# For PDSFunctions: Processes triplets (coeff, P, d)
@inline function lincomb(c1::Number, P1, d1::AbstractArray, c2, P2, d2, args...)
    P_tail, d_tail = lincomb(c2, P2, d2, args...)
    return (c1 .* P1 .+ P_tail, c1 .* d1 .+ d_tail)
end

# For ConservativePDSFunctions: Processes triplets (coeff, P, nothing)
@inline function lincomb(c1::Number, P1, d1::Nothing, c2, P2, d2, args...)
    P_tail, _ = lincomb(c2, P2, d2, args...)
    return (c1 .* P1 .+ P_tail, nothing)
end
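
# Since the recursion runs over the argument list, Julia compiles a separate
# specialization for each number of (coefficient, P, d) triplets, and the
# @inline annotations allow the compiler to collapse the recursive calls into
# a single fused broadcast expression; this is the unrolling mentioned above.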

function run_comprehensive_benchmark(N)
    println("\n" * "█"^70)
    println("  BENCHMARKING SYSTEM SIZE N = $N")
    println("█"^70)

    # Setup test data (3 stages)
    P1, P2, P3 = rand(SMatrix{N,N}), rand(SMatrix{N,N}), rand(SMatrix{N,N})
    d1, d2, d3 = rand(SVector{N}), rand(SVector{N}), rand(SVector{N})
    c1, c2, c3 = 0.5, 0.3, 0.2

    # --- CASE A: Standard PDS (with destruction vectors) ---
    println("\n>>> CASE A: Standard PDS (d is AbstractArray)")
    
    println("\n[A.1] Direct Input (Manual):")
    b_direct_a = @benchmark ($c1 .* $P1 .+ $c2 .* $P2 .+ $c3 .* $P3, 
                             $c1 .* $d1 .+ $c2 .* $d2 .+ $c3 .* $d3)
    display(b_direct_a)

    println("\n[A.2] lincomb Function:")
    b_func_a = @benchmark lincomb($c1, $P1, $d1, $c2, $P2, $d2, $c3, $P3, $d3)
    display(b_func_a)


    # --- CASE B: Conservative PDS (d is nothing) ---
    println("\n" * "-"^70)
    println(">>> CASE B: Conservative PDS (d is nothing)")
    
    println("\n[B.1] Direct Input (Manual):")
    # In manual case, we only calculate P
    b_direct_b = @benchmark ($c1 .* $P1 .+ $c2 .* $P2 .+ $c3 .* $P3, nothing)
    display(b_direct_b)

    println("\n[B.2] lincomb Function:")
    b_func_b = @benchmark lincomb($c1, $P1, nothing, $c2, $P2, nothing, $c3, $P3, nothing)
    display(b_func_b)
    
    return nothing
end

Results:

julia> run_comprehensive_benchmark(2)

██████████████████████████████████████████████████████████████████████
  BENCHMARKING SYSTEM SIZE N = 2
██████████████████████████████████████████████████████████████████████

>>> CASE A: Standard PDS (d is AbstractArray)

[A.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  6.135 ns … 183.600 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.063 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.266 ns ±   3.170 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅ █ ▇ ▄  ▁                                                 
  ▇██▃█▃█▃█▆▃█▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▃
  6.14 ns         Histogram: frequency by time        14.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[A.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  5.700 ns … 152.339 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.569 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.755 ns ±   2.887 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     █ █     ▁ ▂ ▅ ▆                                           
  ▇▅▅█▄█▄█▆▇▅█▅█▃█▄█▄▃▇▃▇▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▂▁▁▂▂▂▂▂▂▁▁▂▂ ▃
  5.7 ns          Histogram: frequency by time        10.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

----------------------------------------------------------------------
>>> CASE B: Conservative PDS (d is nothing)

[B.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  4.823 ns … 174.098 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.560 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.677 ns ±   2.636 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅  ▆  ▅  ▂   █                                               
  █▃▄█▃▃█▄▃█▆▃▃█▇▃▃▅▇▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▁▂▂▂▂▂▂▂▁▁▂▂▁▂▂▁▂ ▃
  4.82 ns         Histogram: frequency by time        9.31 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[B.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  4.823 ns … 70.316 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.562 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.758 ns ±  1.852 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂█▄▇▄▃▇▃▆█▄▃▆▃▂▁▁▁▁▁                                      ▃
  █████████████████████▇▇█▇▇██▇██████▇▇▇▇▇▇▆▇▆▅▆▆▁▅▅▅▅▄▄▄▃▃▄ █
  4.82 ns      Histogram: log(frequency) by time     10.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> run_comprehensive_benchmark(3)

██████████████████████████████████████████████████████████████████████
  BENCHMARKING SYSTEM SIZE N = 3
██████████████████████████████████████████████████████████████████████

>>> CASE A: Standard PDS (d is AbstractArray)

[A.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):   9.181 ns … 153.478 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.074 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.712 ns ±   3.368 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▃▆▃▆▂▆▄▁▅▁                                                 ▂
  █████████████▇▇▆▅▆▅▄▆▆▆▇▇▇▅▆▇▇▇▇▆▇▆▆▅▅▆▂▆▅▆▅▅▆▄▅▅▅▅▄▃▃▄▂▄▃▂▃ █
  9.18 ns       Histogram: log(frequency) by time      24.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[A.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  8.308 ns … 175.174 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.814 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.438 ns ±   3.640 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂▇▃▂▆▂▁▅▁  ▇▂▁▄▅                                           ▂
  ███████████▆██████▇▇▆▆▆▄▄▅▄▅▅▃▅▅▃▄▄▄▂▅▅▄▄▅▄▅▃▃▄▄▄▄▃▄▃▄▂▄▂▄▄ █
  8.31 ns      Histogram: log(frequency) by time      16.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

----------------------------------------------------------------------
>>> CASE B: Conservative PDS (d is nothing)

[B.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  6.126 ns … 166.082 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.728 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.217 ns ±   3.308 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂▇▁▆▁▅▅▁▇▁▁▅▁▁                                             ▂
  ███████████████▇▇▇▆▅▆▅▅▆▇▇▅▆▅▆▆▆▆▇▇▆▇▇▆▇▆▆▇▅▅▅▅▆▆▅▅▆▇▅▃▄▅▅▄ █
  6.13 ns      Histogram: log(frequency) by time      14.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[B.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  6.127 ns … 149.530 ns  ┊ GC (min … max): 0.00% … 0.00%

@SKopecz SKopecz changed the title WIP: Simplified and cleaner code using the function basic_patankar_step. WIP: Simplified and cleaner code. Jan 9, 2026
@SKopecz
Collaborator Author

SKopecz commented Jan 9, 2026

The following shows the efficiency of the new build_mprk_matrix method for StaticMatrix input.

using BenchmarkTools
using StaticArrays
using MuladdMacro
using Test
using LinearAlgebra

# --- IMPLEMENTATIONS ---
### OLD #################
function build_mprk_matrix_old(P, sigma, dt, d = nothing)
    # re-use the in-place version implemented below
    M = similar(P)
    build_mprk_matrix!(M, P, sigma, dt, d)

    if P isa StaticArray
        return SMatrix(M)
    else
        return M
    end
end
@muladd function build_mprk_matrix!(M, P, sigma, dt, d = nothing)
    # M[i,i] = (sigma[i] + dt * d[i] + dt * sum_j≠i P[j,i]) / sigma[i]
    # M[i,j] = -dt * P[i,j] / sigma[j]
    # TODO: the performance of this can likely be improved
    Base.require_one_based_indexing(M, P, sigma)
    @assert size(M, 1) == size(M, 2) == size(P, 1) == size(P, 2) == length(sigma)
    if !isnothing(d)
        Base.require_one_based_indexing(d)
        @assert length(sigma) == length(d)
    end

    zeroM = zero(eltype(P))

    # Set sigma on diagonal
    @inbounds for i in eachindex(sigma)
        M[i, i] = sigma[i]
    end
    # Add nonconservative destruction terms to diagonal (PDSFunctions only!)
    if !isnothing(d)
        @inbounds for i in eachindex(d)
            M[i, i] = M[i, i] + dt * d[i]
        end
    end

    # Run through P and fill M accordingly.
    # If P[i,j] ≠ 0 set M[i,j] = -dt*P[i,j] and add dt*P[i,j] to M[j,j].
    @fastmath @inbounds @simd for I in CartesianIndices(P)
        if I[1] != I[2]
            if !iszero(P[I])
                dtP = dt * P[I]
                M[I] = -dtP / sigma[I[2]]
                M[I[2], I[2]] += dtP
            else
                M[I] = zeroM
            end
        end
    end

    # Divide diagonal elements by Patankar weights denominators
    @fastmath @inbounds @simd for i in eachindex(sigma)
        M[i, i] /= sigma[i]
    end

    return nothing
end

### NEW #################
@inline function sum_offdiagonal_col(P::StaticMatrix{N, N, T}, j) where {N, T}
    res = zero(T)
    for i in 1:N
        if i != j
            res += P[i, j]
        end
    end
    return res
end
@muladd @inline function build_mprk_matrix_new(P::StaticMatrix{N, N, T}, sigma, dt,
                                               d = nothing) where {N, T}
    return SMatrix{N, N, T}((i == j) ?
                            # diagonal: sum over P[k, i] where k != i
                            (sigma[i] +
                             dt *
                             (sum_offdiagonal_col(P, i) + (d === nothing ? zero(T) : d[i]))) /
                            sigma[i] :
                            # off-diagonal
                            -(dt * P[i, j]) / sigma[j]
                            for i in 1:N, j in 1:N)
end
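
# In contrast to build_mprk_matrix_old, which allocates a mutable matrix via
# similar(P) and converts it to an SMatrix at the end (the 208-byte, 1-allocation
# cost visible in the benchmarks below), the generator above constructs the
# SMatrix directly, so the new version remains allocation-free.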

# --- TEST & BENCHMARK SUITE ---

function run_benchmark(N)
    println("\n" * "█"^70)
    println("  COMPREHENSIVE BENCHMARK N = $N")
    println("█"^70)

    # Setup
    P = rand(SMatrix{N, N})
    sigma = rand(SVector{N})
    d_vec = rand(SVector{N})
    dt = 0.1

    # Validation
    println("Checking Correctness...")
    M_old = build_mprk_matrix_old(P, sigma, dt, d_vec)
    M_new = build_mprk_matrix_new(P, sigma, dt, d_vec)
    M_old_cons = build_mprk_matrix_old(P, sigma, dt, nothing)
    M_new_cons = build_mprk_matrix_new(P, sigma, dt, nothing)

    @test M_old ≈ M_new
    @test M_old_cons ≈ M_new_cons
    println("✓ Results are identical.\n")

    # Benchmarks
    println(">>> CASE A: With destruction vector d")
    println("\n[OLD VERSION]")
    display(@benchmark build_mprk_matrix_old($P, $sigma, $dt, $d_vec))

    println("\n[NEW VERSION]")
    display(@benchmark build_mprk_matrix_new($P, $sigma, $dt, $d_vec))

    println("\n" * "-"^70)
    println(">>> CASE B: Conservative (d = nothing)")

    println("\n[OLD VERSION]")
    display(@benchmark build_mprk_matrix_old($P, $sigma, $dt, nothing))

    println("\n[NEW VERSION]")
    display(@benchmark build_mprk_matrix_new($P, $sigma, $dt, nothing))
end
julia> run_benchmark(5)

██████████████████████████████████████████████████████████████████████
  COMPREHENSIVE BENCHMARK N = 5
██████████████████████████████████████████████████████████████████████
Checking Correctness...
✓ Results are identical.

>>> CASE A: With destruction vector d

[OLD VERSION]
BenchmarkTools.Trial: 10000 samples with 988 evaluations per sample.
 Range (min … max):  39.109 ns … 11.639 μs  ┊ GC (min … max):  0.00% … 99.12%
 Time  (median):     48.098 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   72.646 ns ± 214.892 ns  ┊ GC (mean ± σ):  12.98% ±  6.77%

  ▇▇█▅▄▂▂▃▃▂▂ ▆▆▄▃▂▂▁▁▁▁▁▁  ▁                                  ▂
  █████████████████████████████▇▇▇▆▆▅▆▆▆▅▄▆▆▆▆▅▅▅▅▄▅▄▅▃▄▄▄▄▄▄▄ █
  39.1 ns       Histogram: log(frequency) by time       232 ns <

 Memory estimate: 208 bytes, allocs estimate: 1.

[NEW VERSION]
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  13.862 ns … 183.346 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.916 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.972 ns ±   5.200 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▂▂▂▃▃▃ ▁▂▂ ▁▁ ▂ ▁ ▂ ▂                                      ▁
  ████████▇███▆████▆█▆█▆███▆▇▅▅▄▆▆▆▆▆▆▇▇▇▆▇▆▅▄▆▇▆▆▆▅▄▄▃▃▄▃▁▅▅▅ █
  13.9 ns       Histogram: log(frequency) by time      39.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

----------------------------------------------------------------------
>>> CASE B: Conservative (d = nothing)

[OLD VERSION]
BenchmarkTools.Trial: 10000 samples with 991 evaluations per sample.
 Range (min … max):  36.367 ns … 11.648 μs  ┊ GC (min … max):  0.00% … 98.85%
 Time  (median):     42.436 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   66.981 ns ± 206.688 ns  ┊ GC (mean ± σ):  12.98% ±  6.91%

  ▅█▅▂▂▁▁▂▁   ▄▅▃▂▁▁▁▁▁                                        ▁
  █████████▇▆▇███████████▇█▇▇▆▇▆▆▆▅▆▆▆▅▅▅▅▅▄▄▄▅▅▄▄▄▄▅▃▃▂▄▃▄▃▄▄ █
  36.4 ns       Histogram: log(frequency) by time       232 ns <

 Memory estimate: 208 bytes, allocs estimate: 1.

[NEW VERSION]
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  13.692 ns … 159.564 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.731 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.496 ns ±   4.104 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▃▃▂ ▂▂▁▃▃▁▂ ▁ ▁ ▂ ▁▁▂  ▁  ▂  ▁                             ▁
  █████▇███████▇█▅█▅██████▇█▆▆█▅▅█▆▆▆▅▅▅▆▅▄▄▅▁▅▅▅▅▆▆▇▆▇▆▆▆▅▆▅▅ █
  13.7 ns       Histogram: log(frequency) by time      32.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@SKopecz
Collaborator Author

SKopecz commented Jan 12, 2026

The following shows the efficiency of the new lincomb! methods for in-place computations. One method is for AbstractArray, the other for SparseMatrixCSC.

using BenchmarkTools
using StaticArrays
using SparseArrays
using FastBroadcast
using MuladdMacro
using Test

# --- GENERATED IMPLEMENTATION ---

@muladd @generated function lincomb!(dest::AbstractArray, pairs...)
    n = length(pairs) ÷ 2
    expr = :(pairs[1] * pairs[2])
    for i in 2:n
        expr = :($expr + pairs[$(2 * i - 1)] * pairs[$(2 * i)])
    end
    return quote
        @.. broadcast=false dest=$expr
        return dest
    end
end
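
# For example, lincomb!(dest, c1, A1, c2, A2, c3, A3) generates code equivalent to
#     @.. broadcast=false dest = c1 * A1 + c2 * A2 + c3 * A3
# i.e. a single fused, allocation-free broadcast over all coefficient/array pairs.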

@muladd @generated function lincomb!(dest::SparseMatrixCSC, pairs...)
    n = length(pairs) ÷ 2
    nz_names = [Symbol("nz_", i) for i in 1:n]
    setup_block = Expr(:block)
    # We need to keep the structural nonzeros of the production terms.
    # However, this is not guaranteed by broadcasting, see
    # https://github.com/JuliaSparse/SparseArrays.jl/issues/190
    for i in 1:n
        push!(setup_block.args, :($(nz_names[i]) = nonzeros(pairs[$(2 * i)])))
    end
    expr = :(pairs[1] * $(nz_names[1]))
    for i in 2:n
        expr = :($expr + pairs[$(2 * i - 1)] * $(nz_names[i]))
    end
    return quote
        $setup_block
        nz_dest = nonzeros(dest)
        @.. broadcast=false nz_dest=$expr
        return dest
    end
end
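
# Note: this method combines the nonzeros vectors directly, so dest and all
# matrix arguments are assumed to share the same sparsity pattern (as in the
# benchmark below, where every matrix reuses the colptr/rowval of S_ref);
# this is what preserves the structural nonzeros mentioned above.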

# --- BENCHMARK FUNCTION ---

function run_comprehensive_benchmark(N)
    println("\n" * "█"^75)
    println("  LINCOMB! FULL PERFORMANCE REPORT (N = $N, 3 Components)")
    println("█"^75)

    c1, c2, c3 = 0.5, 0.8, -0.2

    # --- 1. DENSE CASE ---
    println("\n>>> CASE 1: Standard Dense Arrays (Matrix{Float64})")
    A1, A2 = rand(N, N), rand(N, N)
    A3_old = rand(N, N)
    A3_new = copy(A3_old)

    # Validation
    A3_old .= c1 .* A1 .+ c2 .* A2 .+ c3 .* A3_old
    lincomb!(A3_new, c1, A1, c2, A2, c3, A3_new)
    @test A3_old ≈ A3_new
    println("✓ Correctness: OK")

    println("\n[OLD] Manual Dense @..")
    display(@benchmark $A3_old .= $c1 .* $A1 .+ $c2 .* $A2 .+ $c3 .* $A3_old)

    println("\n[NEW] lincomb! Dense")
    display(@benchmark lincomb!($A3_new, $c1, $A1, $c2, $A2, $c3, $A3_new))

    # --- 2. SPARSE CASE ---
    println("\n" * "-"^75)
    println("\n>>> CASE 2: Sparse Matrices (SparseMatrixCSC)")
    S_ref = sprand(N, N, 0.01)
    S1 = SparseMatrixCSC(S_ref.m, S_ref.n, S_ref.colptr, S_ref.rowval, rand(nnz(S_ref)))
    S2 = SparseMatrixCSC(S_ref.m, S_ref.n, S_ref.colptr, S_ref.rowval, rand(nnz(S_ref)))
    S3_old = SparseMatrixCSC(S_ref.m, S_ref.n, S_ref.colptr, S_ref.rowval, rand(nnz(S_ref)))
    S3_new = copy(S3_old)

    # Validation
    nz1, nz2, nz3 = nonzeros(S1), nonzeros(S2), nonzeros(S3_old)
    @.. broadcast=false nz3 = c1 * nz1 + c2 * nz2 + c3 * nz3
    lincomb!(S3_new, c1, S1, c2, S2, c3, S3_new)
    @test S3_old ≈ S3_new
    println("✓ Correctness: OK")

    println("\n[OLD] Manual Sparse nz-broadcast")
    display(@benchmark begin
        nz_1 = nonzeros($S1)
        nz_2 = nonzeros($S2)
        nz_3 = nonzeros($S3_old)
        @.. broadcast=false nz_3 = $c1 * nz_1 + $c2 * nz_2 + $c3 * nz_3
    end)

    println("\n[NEW] lincomb! Sparse")
    display(@benchmark lincomb!($S3_new, $c1, $S1, $c2, $S2, $c3, $S3_new))
end
julia> run_comprehensive_benchmark(1000)

███████████████████████████████████████████████████████████████████████████
  LINCOMB! FULL PERFORMANCE REPORT (N = 1000, 3 Components)
███████████████████████████████████████████████████████████████████████████

>>> CASE 1: Standard Dense Arrays (Matrix{Float64})
✓ Correctness: OK

[OLD] Manual Dense @..
BenchmarkTools.Trial: 2111 samples with 1 evaluation per sample.
 Range (min … max):  1.824 ms … 4.594 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.084 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.351 ms ± 596.946 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █▃                                                          
  ███▇▇▇▅▄▅▅▄▄▃▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▁▂▂▂▁▁▁▁▂▁▁▁▁▁▁▁ ▂
  1.82 ms         Histogram: frequency by time        4.08 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

[NEW] lincomb! Dense
BenchmarkTools.Trial: 2434 samples with 1 evaluation per sample.
 Range (min … max):  1.437 ms … 3.640 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.751 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.035 ms ± 544.079 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▂▁▆██▄                                                   
  ▂▁▂▃████████▇▇▅▄▄▃▃▂▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▃▂▂▂▂▃▃▄▄▅▄▃▃▃▃▃▃▃▃▄▄▄▃ ▃
  1.44 ms         Histogram: frequency by time        3.26 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

---------------------------------------------------------------------------

>>> CASE 2: Sparse Matrices (SparseMatrixCSC)
✓ Correctness: OK

[OLD] Manual Sparse nz-broadcast
BenchmarkTools.Trial: 10000 samples with 9 evaluations per sample.
 Range (min … max):  2.446 μs … 26.794 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.944 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.266 μs ±  1.174 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▅▅▇█▇▆▄▄▃▃▃▂▃▂▂▂▂▂▃▂▂▁▁                                   ▂
  ███████████████████████████▆█▇▆▇▆▆▅▆▅▆▅▅▅▆▆▄▅▅▄▅▄▄▄▃▃▂▂▄▄▄ █
  2.45 μs      Histogram: log(frequency) by time      8.1 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

[NEW] lincomb! Sparse
BenchmarkTools.Trial: 10000 samples with 9 evaluations per sample.
 Range (min … max):  3.267 μs … 20.178 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.845 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.050 μs ±  1.044 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃▄▁ ▆▅ █                                                   
  ████▇██▆█▆█▃▃▂▂▃▂▂▃▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  3.27 μs        Histogram: frequency by time        8.24 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

@SKopecz
Collaborator Author

SKopecz commented Jan 14, 2026

Tried to optimize loops like

for i in eachindex(b)
    b[i] += dt * P[i, i]
end

for sparse matrices P, but there was no benefit.

    end
    return nothing
end

# --- BENCHMARK RUNNER ---
using BenchmarkTools, SparseArrays, LinearAlgebra, Plots

function run_scaling_analysis()
    # Problem sizes (N) from 10 to 100,000
    Ns = [10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000, 300000, 1000000]
    
    times_naive = Float64[]
    times_opt = Float64[]
    dt = 0.1

    println("Starting scaling benchmarks...")

    for n in Ns
        # Setup: 10 non-zeros per column + diagonal
        b = rand(n)
        P = sprand(n, n, 10/n) + 2I
        
        # Benchmark Naive
        t_naive = @belapsed add_diagonal_naive!($b, $P, $dt)
        push!(times_naive, t_naive)

        # Benchmark Optimized
        t_opt = @belapsed add_diagonal!($b, $P, $dt)
        push!(times_opt, t_opt)
        
        println("Completed N = $n")
    end

    # --- PLOTTING ---
    
    p = plot(Ns, [times_naive, times_opt],
        xlabel = "Problem Size (N)",
        ylabel = "Time (seconds)",
        label = ["Naive (P[i,i])" "Optimized (nzrange)"],
        title = "Scaling of Diagonal Addition: Naive vs Optimized",
        xscale = :log10,
        yscale = :log10,
        legend = :topleft,
        marker = :circle,
        lw = 2,
        grid = true
    )
    
    # Save or display
    display(p)
    return p
end
[plot_22: log-log plot of time vs. problem size N, "Naive (P[i,i])" vs. "Optimized (nzrange)"]
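
Since the implementations compared by this runner are truncated above, here is a minimal sketch of what the two variants could look like. Only the names add_diagonal_naive! and add_diagonal! come from the benchmark code; the bodies, in particular the nzrange-based one, are an assumption based on the plot labels.

using SparseArrays

# Naive variant: every P[i, i] access on a SparseMatrixCSC searches within the
# corresponding column, so the loop pays that lookup cost per diagonal entry.
function add_diagonal_naive!(b, P, dt)
    for i in eachindex(b)
        b[i] += dt * P[i, i]
    end
    return nothing
end

# Sketch of an nzrange-based variant: walk the stored entries of each column
# once and pick out the diagonal entry.
function add_diagonal!(b, P::SparseMatrixCSC, dt)
    rows = rowvals(P)
    vals = nonzeros(P)
    for j in axes(P, 2)
        for k in nzrange(P, j)
            if rows[k] == j
                b[j] += dt * vals[k]
                break
            end
        end
    end
    return nothing
end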

@SKopecz SKopecz marked this pull request as ready for review January 15, 2026 13:45
Member

@JoshuaLampert JoshuaLampert left a comment


I didn't go through everything in detail (especially the definition of lincomb), but I like the idea of bundling common steps in functions and I trust CI that the replacements are correct. I only have one question about type stability.

@ranocha ranocha changed the title WIP: Simplified and cleaner code. Simplified and cleaner code Jan 18, 2026
Member

@ranocha ranocha left a comment


Thanks, I like the idea of defining functions for commonly-used functionality. However, this PR is quite long, so it will take some time to review everything in detail.
Did you perform some performance benchmarks for full time integrations (with save_everystep = false)?

end

#####################################################################
# out-of-place for dense and static arrays
Member


Can we re-add some comments like this one to inform readers of the code whether we are treating in-place or out-of-place parts? This helps when looking for possible performance issues caused by allocations (for in-place methods).

Collaborator Author


Added comments for all auxiliary functions.

@SKopecz
Collaborator Author

SKopecz commented Jan 19, 2026

Thanks, I like the idea of defining functions for commonly-used functionality. However, this PR is quite long, so it will take some time to review everything in detail.

Take your time. Once this is pushed, implementing additional schemes will be significantly simplified.

@SKopecz
Collaborator Author

SKopecz commented Jan 19, 2026

Did you perform some performance benchmarks for full time integrations (with save_everystep = false)?

I'll do this.
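
For reference, a minimal sketch of such a benchmark, assuming one of the prob_pds_* test problems shipped with PositiveIntegrators.jl (prob_pds_nonlinmod is used here as an example) and the MPRK22 solver:

using PositiveIntegrators, BenchmarkTools

# Full time integration without storing intermediate steps, so the timing
# reflects the stepping code rather than solution saving.
prob = prob_pds_nonlinmod  # example problem, assumed to be available
@benchmark solve($prob, MPRK22(1.0); save_everystep = false)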
